FFT operating apparatus of programmable processors and operation method thereof

ABSTRACT

A fast Fourier transform (FFT) operating apparatus and a method thereof carry out the FFT, which is the kernel function of DMT (Discrete Multitone) and OFDM (Orthogonal Frequency Division Multiplexing) modems used to transmit data at high speed, in a programmable processor. The programmable processor adopts the advantages of both an on-demand semiconductor based system and a programmable processor, so that it can process high speed telecommunication algorithms in real-time and can be applied to various standards by securing the design flexibility of the system.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Korean Patent Application No. 2002-78393 filed Dec. 10, 2003, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.

BACKGROUND

1. Field of the Invention

The present invention relates to a fast Fourier transform (FFT) operating apparatus and an operation method thereof. More particularly, the present invention relates to an FFT operating apparatus and a method thereof for carrying out the FFT operation, which is the kernel function of DMT (Discrete MultiTone) and OFDM (Orthogonal Frequency Division Multiplexing) modems, in a programmable processor which can be used with a variety of standards, enables processing of high speed telecommunication algorithms in real-time, and also guarantees flexibility in system design.

2. Description of the Related Art

Generally, the fast Fourier transform (FFT) is used in a variety of fields of communication systems such as the asymmetric digital subscriber line (ADSL), the wireless asynchronous transfer mode (ATM), and the short distance wireless communication network, and in applications such as a matched filter, a spectrum analysis, and a radar. The FFT is especially required for the establishment of the OFDM, i.e., the next-generation high speed telecommunication algorithm. The FFT is the algorithm that transforms a signal in the time domain into the frequency domain. Because the FFT significantly reduces the operations required for the Discrete Fourier Transform (DFT) by using the periodicity of the trigonometric functions, the operations can be carried out efficiently. The DFT can be expressed by the following formula 1:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{kn}, \qquad k = 0, 1, \ldots, N-1, \qquad w_N^{kn} = e^{-j 2\pi nk/N} \qquad \text{[Formula 1]}$$

By re-arranging x(n) of the formula 1 into odd-numbered and even-numbered samples, respectively, the N-point DFT is divided into two N/2-point DFTs and can be expressed as the following formula 2:

$$\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \\
&= \sum_{n\ \text{even}} x(n)\, w_N^{nk} + \sum_{n\ \text{odd}} x(n)\, w_N^{nk} \\
&= \sum_{l=0}^{N/2-1} x(2l)\, w_N^{2lk} + \sum_{l=0}^{N/2-1} x(2l+1)\, w_N^{(2l+1)k} \\
&= \sum_{n=0}^{N/2-1} x(2n)\, w_{N/2}^{nk} + w_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, w_{N/2}^{nk}
\end{aligned} \qquad \text{[Formula 2]}$$

As the formula 2 is repeated, the N-point DFT is divided into several 2-point DFTs, and this process is referred to as the radix-2 DIT (Decimation-in-Time) FFT.
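The repeated splitting described above can be illustrated with a short sketch. It is not taken from the specification: the function names `dft` and `fft_radix2_dit` are hypothetical, and the recursive form is chosen only because it mirrors formula 2 directly, with the even-numbered and odd-numbered samples transformed separately and then combined.

```python
import cmath

def dft(x):
    """Direct evaluation of formula 1: X(k) = sum over n of x(n) * w_N^(kn)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft_radix2_dit(x):
    """Recursive radix-2 DIT FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2_dit(x[0::2])  # N/2-point DFT of the even-numbered samples
    odd = fft_radix2_dit(x[1::2])   # N/2-point DFT of the odd-numbered samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # w_N^k times the odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t  # uses w_N^(k + N/2) = -w_N^k
    return out
```

Both functions return the same spectrum up to rounding error; the recursive version needs (N/2)log₂ N butterflies instead of N² complex multiplications.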

Among the methods to split the DFT of formula 1, the radix-2 and radix-4 DIT FFTs are the most frequently used in implementations.

The radix-2 DIT FFT is split into odd-numbered and even-numbered samples as in the formula 2, while the radix-4 DIT FFT is split into four sets. Between these two FFTs, the radix-2 DIT FFT has a simpler butterfly structure, and thus requires fewer multipliers and less space. However, the number of stages increases in the radix-2 DIT FFT, and thus it consumes many more operation cycles than the radix-4 DIT FFT. The radix-4 DIT FFT enables high speed processing, but it has a complicated butterfly structure and increases the number of multipliers. Also, the operations for the butterfly input data and addresses are complicated and quite hard to implement. Additionally, because the radix-4 DIT FFT applies to an FFT having a length of 4^(n), it has to be used in combination with the radix-2 DIT FFT for an FFT having a length of 2^(n).

Further, the FFT is divided into the DIT (Decimation-In-Time) FFT and the DIF (Decimation-In-Frequency) FFT according to whether the dividing is based on the time domain or the frequency domain. The formula 2, which is divided with respect to the time domain, is categorized as the DIT FFT. If the dividing is performed with respect to X(k) in the frequency domain, it is categorized as the DIF FFT.

In the digital signal processor, the DIT FFT is usually used for the FFT. While the DIF FFT adopts the configuration of performing addition/subtraction and then multiplication, the DIT FFT, as shown in FIG. 1, adopts the configuration of performing multiplication and then addition/subtraction. Accordingly, for a digital signal processor based on a multiplier-accumulator, the DIT FFT is more suitable for the operations.

For example, the DSP 56600 core is a fixed-point digital signal processor which consists of one 16×16 multiplier-accumulator (MAC) and one 40-bit ALU (arithmetic and logic unit), and carries out the radix-2 complex FFT butterfly operation using two parallel shift instructions. Since the DSP 56600 core has the configuration of a single multiplier-accumulator, it has a wide area, however, with less operation efficiency compared with the configuration of a dual multiplier-accumulator. The DSP 56600 core takes 8N+9 cycles to perform N radix-2 complex FFT butterfly operations.

FIG. 2 shows another example of an operator using the DIT FFT, especially showing a Carmel™ DSP core by Infineon Technologies AG. The Carmel™ DSP core is a 16-bit fixed-point decimation core, which includes two multiplexers 11, 11′ to select values for a data memory, two latch registers 12, 12′ to store selected outputs from the multiplexers 11, 11′, data bus switches 13, 13′ to switch data such as the result of a data operation at the data memory to input to a corresponding operator in accordance with a desired operation, two registers 14, 14′ storing data for input to the next-stage multiplier-accumulator, a first arithmetic unit 15 having a 16×16 MAC, a 40-bit ALU, and an exponenter and a shifter for a block fixed point operation, a second arithmetic unit 16 having a 16×16 MAC and a 40-bit ALU, and an accumulator bank 17 to accumulate and store results operated in the first and second arithmetic units 15, 16 and switched by the data bus switches 13, 13′. The Carmel™ DSP core, which adopts a CLIW (Configurable Long Instruction Word) architecture, can carry out up to 6 operations including 2 parallel data shifts in a single cycle. Also, as the Carmel™ DSP core supports an automatic scaling mode, an overflow generated in the FFT operations can be handled without having to use an additional cycle. However, the Carmel™ DSP core has a complex hardware configuration since it is designed in the CLIW architecture to allow the parallel processing of the operations. To carry out N radix-2 complex FFT butterfly operations by using the Carmel™ DSP core, 2N+2 cycles are required.

FIG. 3 shows yet another example of an operator using the DIT FFT, especially showing a Starcore™ SC140 operator. The SC140, which applies a VLIW (Very Long Instruction Word) architecture, includes two data memory buses 21, 21′ to send/receive data to and from the data memory, 8 shifter/limiters 22 to shift or limit the operated data stored in the data register and load the data to the data memory buses 21, 21′, and four 40-bit ALUs 24, 25, 26, 27. As each of the ALUs 24, 25, 26, 27 has a MAC, it is possible to carry out up to four MAC operations or ALU operations in a single cycle. As a result, using the four MACs, the FFT operations are carried out in fewer operation cycles than in a digital signal processor having a single or dual MAC.

However, the Starcore™ SC140 has a large size and consumes much power due to the integration of many operation components. Further, it is difficult to efficiently allot the operation components due to the data dependency, and to read or write the required data in the memory during a single cycle due to the lack of the data bus. As a result, a bottleneck may occur, so that the performance does not reach twice that of the dual MAC structure.

In case of carrying out N complex FFT butterfly operations using the SC140, 1.5N cycles are required. The above digital signal processors focus on increasing the number of the operators to accelerate the FFT butterfly operation or on adjusting the data path to fit the butterfly operation flow. However, with a limited number of operators, there is a limit to how far the operation cycles of the butterfly can be reduced.

(N/2)log₂ N butterflies are needed for the N-point FFT. Thus, assuming that two cycles are required for each butterfly operation and other influences are not considered, (2N/2)log₂ N cycles are needed for the N-point FFT. In fact, during the FFT operation, additional operation cycles may be generated for the data shift or the data address calculation.

Table 1 compares the number of butterfly operation cycles and the number of N-point FFT operation cycles of the Carmel DSP core and the TMS320C62x. As shown in Table 1, additional cycles are required beyond the butterfly operation cycles. In case of the Carmel DSP core, (2N/2)log₂ N cycles are needed for the butterfly operation, and in case of the TMS320C62x, (4N/2)log₂ N cycles are needed.

TABLE 1

Processor          Number of butterfly operation cycles   N-point FFT
Carmel DSP core    2                                      (2N/2)log₂ N + 5N/4 + 10 log₂ N + 4
TMS320C62x         4                                      (4N/2)log₂ N + 7 log₂ N + N/4 + 9
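As a worked example, the two N-point FFT expressions of Table 1 can be evaluated for concrete lengths. The following sketch is illustrative only; the function names `carmel_cycles` and `c62x_cycles` are hypothetical, and N is assumed to be a power of two.

```python
def log2_int(n):
    """Exact log2 of a power of two, computed without floating point."""
    return n.bit_length() - 1

def carmel_cycles(n):
    """Carmel DSP core: (2N/2)log2 N + 5N/4 + 10 log2 N + 4."""
    s = log2_int(n)
    return (2 * n // 2) * s + (5 * n) // 4 + 10 * s + 4

def c62x_cycles(n):
    """TMS320C62x: (4N/2)log2 N + 7 log2 N + N/4 + 9."""
    s = log2_int(n)
    return (4 * n // 2) * s + 7 * s + n // 4 + 9

for n in (256, 1024):
    print(n, carmel_cycles(n), c62x_cycles(n))
```

For N = 256 the butterfly terms alone are 2048 and 4096 cycles, so the remaining terms account for 404 and 129 additional cycles, respectively.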

FIG. 4 shows an operation of a general 8-point radix-2 DIT FFT. In case of the N-point FFT operation, there are log₂ N stages and N−1 groups. Accordingly, there are 3 stages and 7 groups in FIG. 4, and as the number of the stages increases, the number of the butterflies in the group increases or decreases.

The FFT operation is carried out in one stage and then in the next stage. In a stage, the operation is carried out group by group. As for C or assembly codes to implement the FFT, as shown in FIG. 5, 3 looping instructions are used for the operations of the stages, the groups, and the butterflies in each group, which may vary according to the architecture of the programmable processor and the program. Generally, 3 or 4 cycles are required to carry out a looping instruction in the digital signal processor. Assuming that L cycles are required for a single butterfly operation and M cycles are required to carry out the looping instruction, the number of the cycles to carry out the N-point FFT operation can be obtained through the following formula 3.

(L×N/2)log₂ N + M×(N−1) + M log₂ N + α   [Formula 3]

In formula 3, (L×N/2)log₂ N, which is determined by L, may be changed according to the number of the MACs and the ALUs of the digital signal processor, and M×(N−1)+M log₂ N, which is determined by M, may be changed according to the configuration of a program controller of the digital signal processor.
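To make the relative weight of the terms in formula 3 concrete, the following sketch evaluates them separately. The function name `formula3_cycles` and the choice of returning the two parts as a pair are assumptions made for illustration; α is left as a parameter because the text gives no closed form for it.

```python
def formula3_cycles(n, l, m, alpha=0):
    """Evaluate formula 3 for an N-point FFT (n a power of two).

    l: cycles per butterfly operation, m: cycles per looping instruction,
    alpha: cycles for address modification and data shift (not modeled here).
    Returns (butterfly_cycles, total_cycles).
    """
    stages = n.bit_length() - 1            # log2 N
    butterfly = l * (n // 2) * stages      # (L x N/2) log2 N
    looping = m * (n - 1) + m * stages     # M x (N - 1) + M log2 N
    return butterfly, butterfly + looping + alpha

# Example: L = 2 cycles per butterfly, M = 3 cycles per looping instruction.
print(formula3_cycles(256, 2, 3))
```

With L = 2 and M = 3, a 256-point FFT spends 2048 cycles on butterflies and 789 cycles on looping instructions before α is counted, which is the overhead the present invention targets.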

In the butterfly operation in a group of a stage, the address of the input data increases by 1. Meanwhile, when the group is altered, the address of the input data of the butterfly varies according to the size of the group. In formula 3, α denotes the number of cycles required for this address modification together with the cycles required for the data shift. If parallel processing is feasible as in the VLIW processor, the number of the additional operation cycles, except for the butterfly, may be reduced to some degree by parallel-processing diverse instructions through the assembly decoding. However, the effect of the parallel processing is not sufficient. Referring to FIG. 4, the address modification according to the alteration of the group is described by way of example. “a” in the first butterfly (① in FIG. 4, group 1) of the stage 2 is a memory address 0 and “b” is a memory address 2. “a” in the second butterfly of the stage 2 (② in FIG. 4, group 1) is a memory address 1, and “b” is a memory address 3. “a” in the third butterfly of the stage 2 (③ in FIG. 4, group 2) is a memory address 4, and “b” is a memory address 6. The address of the input data “a” in the group 1 increases from 0 by 1. Meanwhile, as the operation is altered from the group 1 to the group 2, the address of “a” changes from 1 to 4. That is, as the group is altered, the address increment of the input data also changes.
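The address pattern described above can be reproduced with the three nested loops (stages, groups, butterflies) that a conventional program would use. This is a sketch under assumptions: the function name `butterfly_addresses` is hypothetical, and the stage ordering (one group of N/2 butterflies first, with the group size halving at each stage) is assumed, since FIG. 4 itself is not reproduced in the text. The stage 2 addresses of an 8-point FFT are the same as those given above.

```python
def butterfly_addresses(n):
    """Return [(stage, group, addr_a, addr_b), ...] for an n-point radix-2 FFT.

    Stage and group numbers are 1-based; n is assumed to be a power of two.
    """
    result = []
    span = n // 2   # butterflies per group, also the distance between "a" and "b"
    groups = 1
    stage = 1
    while span >= 1:                   # loop over stages
        for g in range(groups):        # loop over groups
            base = g * 2 * span        # address jump when the group is altered
            for i in range(span):      # loop over butterflies: address increases by 1
                result.append((stage, g + 1, base + i, base + i + span))
        span //= 2
        groups *= 2
        stage += 1
    return result

for entry in butterfly_addresses(8):
    print(entry)
```

For N = 8 this yields 12 butterflies in 7 groups over 3 stages, and in stage 2 the address of “a” jumps from 1 to 4 when the group changes.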

As aforementioned, to reduce the number of the operation cycles of the N-point FFT in a programmable processor such as the digital signal processor, it is required to minimize the additional operation cycles except for the butterfly operation cycles. However, since the conventional digital signal processors do not support a hardware structure to reduce the additional operation cycles except for the cycles required for the butterfly operations, it is difficult to reduce the number of the operation cycles.

SUMMARY

An aspect of the present invention is to provide a fast Fourier transform (FFT) operating apparatus and an operation method thereof to reduce the operation cycles additionally generated in a programmable processor except for a butterfly operation.

To achieve the above aspect of the present invention, a radix-2 complex FFT operation method to carry out a FFT operation in the programmable processor includes generating a start signal and applying a FFT operation signal if the FFT starts, generating an offset address of a butterfly input/output data to read a data and write an operated result in a data memory, storing the generated offset address of the butterfly input/output data in an offset register of a programmable processor, switching a data to provide the butterfly input data from the data memory and write the output data in the data memory, carrying out a butterfly operation using two multiplier-accumulators, an arithmetic and logic unit, and an exponenter, and generating a stop signal and resetting the FFT operation signal when the operation is ended. At this time, using operation instructions SBUTTERFLY (subtract butterfly) and ABUTTERFLY (add butterfly), the FFT operation apparatus carries out the FFT operation.

According to the present invention, even in the conventional programmable processor in which performance is not enhanced through the acceleration of the butterfly operation, performance can be enhanced by minimizing the operation cycles generated during a looping instruction, data shift, and address calculation of butterfly input data except for the butterfly operations.

BRIEF DESCRIPTION OF THE DRAWINGS

The above aspects and other features of the present invention will become more apparent by describing in detail a preferred embodiment thereof with reference to the attached drawings, in which:

FIG. 1 is a view showing a structure of a DIT FFT butterfly;

FIG. 2 is a view showing a configuration of the conventional Carmel DSP core operator by Infineon Technologies AG;

FIG. 3 is a view showing a configuration of the conventional SC140 operator by Starcore™;

FIG. 4 is a flow graph showing an operation of a conventional 8-point radix-2 DIT FFT;

FIG. 5 is a view showing a programming architecture of a FFT using a looping instruction;

FIG. 6 is a view showing a configuration of a programmable processor for FFT according to the present invention;

FIG. 7 is a flow graph showing an operation of a butterfly according to the present invention;

FIG. 8 is a flow chart showing the generation of an offset address of DIT butterfly data;

FIG. 9 is a view showing a configuration of an operator carrying out the operation of FIG. 8;

FIG. 10 is a view showing a configuration of a data processor carrying out the DIT butterfly operation according to the present invention;

FIG. 11A is a view showing a configuration of a dual multiplier-accumulator having 2 separate multiplier-accumulators;

FIG. 11B is a view showing a configuration of a dual multiplier-accumulator using a 3-input adder;

FIG. 11C is a view showing a dual multiplier-accumulator carrying out functions of FIGS. 11A and 11B using a multiplexer; and

FIG. 12 is a view showing a configuration of a data bus switch of the data processor.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 6 shows a fast Fourier transform (FFT) operating apparatus to quickly carry out an N-point radix-2 DIT FFT operation without generating additional cycles except for the butterfly operations. Referring to FIG. 6, the FFT operating apparatus includes a program controller 110, a program memory 120, a FFT address generator 130, an address generator 140, a data processor 150, a data memory 160, and a flag register 170.

The program controller 110 generates a FFT start signal and controls a programmable processor. The program memory 120 stores an application of the programmable processor. The FFT address generator 130 generates an offset address of a FFT butterfly input data and an operation stop signal. The address generator 140 uses the offset address generated in the FFT address generator 130 to calculate an address of the data memory 160. The data memory 160 stores data, and the data processor 150 uses the data stored in the data memory 160 to carry out an arithmetic and logic operation. The flag register 170 generates a FFT operation signal.

The data processor 150 includes a data bus switch circuit to receive the butterfly input data from the data memory 160 and to write an output data in the data memory 160, a butterfly operation circuit having two multiplier-accumulators to multiply and accumulate the data and one arithmetic and logic unit, an exponential operation circuit to carry out an exponential operation of the data during the butterfly operation, an input register to store data memory values, and an accumulator to store operation results and reuse the stored data for the operation.

FIG. 7 is a flow graph of the butterfly operation according to the present invention, which shows the butterfly of FIG. 1 as a complex operation. The complex operation is represented as the following formula 4. “a” and “b” denote the butterfly input data, “c” and “d” denote the butterfly output data, and “w” denotes a twiddle factor. Subscripts “r” and “i” respectively denote a real part and an imaginary part of each data.

$$\begin{aligned}
c_r &= a_r + w_r b_r - w_i b_i \\
c_i &= a_i + w_r b_i + w_i b_r \\
d_r &= a_r - w_r b_r + w_i b_i \\
d_i &= a_i - w_r b_i - w_i b_r
\end{aligned} \qquad \text{[Formula 4]}$$

To operate a single complex butterfly, 6 input data are required and 4 output data are generated. Because the operation is divided into 2 cycles, it is implemented using a data memory configuration capable of reading 3 input data and writing 2 output data in a single cycle. In a first cycle, two of the 4 input data are multiplied and subtracted. At this time, the operation is carried out according to an operational instruction SBUTTERFLY. In a second cycle, two of the 4 input data are multiplied and added. This operation is carried out according to an operational instruction ABUTTERFLY.
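The arithmetic of formula 4 can be sketched in two steps that follow the multiply-and-subtract and multiply-and-add cycles described above. The mapping of the two steps onto the SBUTTERFLY and ABUTTERFLY instructions is an assumption made for illustration, as is the function name `butterfly`; the exact instruction semantics are not defined in the text.

```python
def butterfly(a, b, w):
    """Compute formula 4 for inputs given as (real, imaginary) pairs.

    Returns the outputs c and d as (real, imaginary) pairs.
    """
    a_r, a_i = a
    b_r, b_i = b
    w_r, w_i = w
    # Step 1 (assumed to correspond to SBUTTERFLY): multiply and subtract,
    # giving the real part of w * b.
    t_r = w_r * b_r - w_i * b_i
    # Step 2 (assumed to correspond to ABUTTERFLY): multiply and add,
    # giving the imaginary part of w * b.
    t_i = w_r * b_i + w_i * b_r
    c = (a_r + t_r, a_i + t_i)   # c = a + w * b
    d = (a_r - t_r, a_i - t_i)   # d = a - w * b
    return c, d

# Example with a = 1 + 2j, b = 3 + 4j, w = -j.
print(butterfly((1, 2), (3, 4), (0, -1)))
```

The six inputs are a_r, a_i, b_r, b_i, w_r, and w_i, and the four outputs are c_r, c_i, d_r, and d_i, in agreement with the counts given above.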

The program controller 110 controls a program of a conventional programmable processor. Also, the program controller 110 decodes a FFT instruction, transmits an N value from the N-point FFT to the FFT address generator 130, and generates the FFT operation start signal. The FFT address generator 130 receives the N value and the operation start signal from the program controller 110 to generate the offset address of the data.

FIG. 8 shows a method to generate the offset address of the data in the FFT address generator 130, which includes starting the FFT if the FFT start signal is ‘1’; initializing a group count, a loop count, and a group count max value to ‘1’, respectively, a group offset value to ‘−1’, a loop count max value to ‘N/2’, and an offset address value of the twiddle factor to ‘0’ when the FFT starts; calculating an address of an input data A by adding the group offset and the loop count value, and an address of an input data B by adding the group offset, the loop count, and the loop count max value; if the loop count value is not equal to the loop count max value, increasing the loop count value by 1 and resuming from calculating the addresses of the input data A, B; if the loop count value is equal to the loop count max value, initializing the loop count value to ‘1’, setting the group offset value with a value obtained by multiplying the loop count max value by 2 and adding the group offset value, and increasing the twiddle factor by 1; if the group count is not equal to the group count max value, increasing the group count by 1 and resuming from calculating the addresses of the input data A, B; if the group count value is equal to the group count max value, initializing the group count value to ‘1’, the group offset value to ‘−1’, and the twiddle factor to ‘0’, dividing the loop count max value by 2, and multiplying the group count max value by 2; if the group count max value is greater than N/2, generating the operation stop signal and ending the FFT operation; and, if the group count max value is not greater than N/2, resuming from calculating the addresses of the input data A, B.
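The flow of FIG. 8 can be followed step by step in software. The sketch below is a behavioral model only, not the hardware of the address generator; the generator name `fft_offsets` and the variable names are hypothetical, and N is assumed to be a power of two of at least 2.

```python
def fft_offsets(n):
    """Yield (addr_a, addr_b, twiddle_offset) for each butterfly, per FIG. 8."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    # Initialization when the FFT starts.
    group_count, loop_count, group_count_max = 1, 1, 1
    group_offset, loop_count_max, twiddle = -1, n // 2, 0
    while True:
        addr_a = group_offset + loop_count
        addr_b = group_offset + loop_count + loop_count_max
        yield addr_a, addr_b, twiddle
        if loop_count != loop_count_max:
            loop_count += 1                    # next butterfly in the same group
            continue
        loop_count = 1
        group_offset += 2 * loop_count_max     # address jump to the next group
        twiddle += 1
        if group_count != group_count_max:
            group_count += 1                   # next group in the same stage
            continue
        # End of stage: reset the group state and rescale the max values.
        group_count, group_offset, twiddle = 1, -1, 0
        loop_count_max //= 2
        group_count_max *= 2
        if group_count_max > n // 2:
            return                             # operation stop signal

for a, b, t in fft_offsets(8):
    print(a, b, t)
```

For N = 8 the model produces 12 address pairs, one per butterfly, without any separate looping or address calculation step between them.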

In order to handle the three nested loops, namely a butterfly operation loop, a group operation loop, and a stage operation loop, a comparison is carried out three times. The loop count max value and the group count max value respectively represent the number of the butterflies included in each group and the number of the groups included in each stage. If the loop count value and the group count value reach their respective max values, the operation proceeds to the next group and the next stage, respectively. The group offset represents the address modification value when the group is altered.

FIG. 9 shows the configuration of the FFT address generator 130 to carry out the operations in FIG. 8. Referring to FIG. 9, the FFT address generator 130 includes a logical sum logic 131, an adder 132, GR, WR, LCR, and GCR registers 133, a group counter 134, a loop counter 135, a glue logic 136, a first adder 137, a second adder 137′, a first comparator 138, a second comparator 138′, and a third comparator 138″. The logical sum logic 131 generates an initialization signal of a register to store the loop count value and a register to store the group count value according to the start signal and a group count match signal. The adder 132 updates the group offset by adding to it the loop count max value multiplied by 2. The GR, WR, LCR, GCR registers 133 store the group offset, the twiddle factor, the loop count max value, and the group count max value. The group counter 134 calculates the group count value, and the loop counter 135 calculates the loop count value. The glue logic 136 consists of a logic which generates a signal to initialize the group counter and the loop counter. The first adder 137 outputs the address of the input data A by adding the group offset and the loop counter value. The second adder 137′ outputs the address of the input data B by adding the output from the first adder 137 and the loop count max value. The first comparator 138 compares the loop count value and the loop count max value, the second comparator 138′ compares the group counter value and the group count max value, and the third comparator 138″ receives the N value and the group count max value and compares the group count max value and the N/2 value.

If the FFT operation start signal is applied, the loop counter 135 and the group counter 134 are initialized to ‘1’, and the GR, WR, LCR, GCR registers 133 are initialized to ‘−1’, ‘0’, ‘N/2’, and ‘1’, respectively. If the values of the loop counter 135 and the LCR register 133 are identical, ‘1’ is applied to the loop count match signal. If the values of the group counter 134 and the GCR register 133 are identical, ‘1’ is applied to the group count match signal. The group counter 134 carries out the counting only if the loop count match signal is ‘1’. The loop counter 135 and the group counter 134 are re-initialized when the loop count match signal and the group count match signal become ‘1’, respectively. The GR register 133 has a load input terminal to update a GR register value and another load input terminal to initialize it. The WR register 133 increases a WR register value by 1 if the loop count match signal is ‘1’, and is initialized to ‘0’ if the group count match signal is ‘1’. The WR register 133 outputs a bit-reversed value. The LCR register 133 carries out a 1-bit right shift if the group count match signal becomes ‘1’. An initial value of the LCR register 133 is N/2. The GCR register 133 carries out a 1-bit left shift every time the group count match signal is applied. If the GCR register value becomes N, the FFT operation stop signal is generated.
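The register behavior described above can be combined with the butterfly of formula 4 into a complete transform, which gives a way to check the addressing scheme numerically. The sketch below rests on several assumptions that the text does not state: the bit-reversed WR value is taken over log₂(N/2) bits and used as the exponent of w_N, the input is in natural order, and the result is read out in bit-reversed order. The names `bit_reverse` and `fft_in_place` are hypothetical.

```python
import cmath

def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def fft_in_place(x):
    """Radix-2 DIT FFT driven by LCR/GCR/WR-style state (n a power of two >= 2).

    Returns the spectrum in bit-reversed order: X(k) is at index
    bit_reverse(k, log2 n).
    """
    n = len(x)
    wr_bits = n.bit_length() - 2       # assumed WR width: log2(N/2) bits
    data = list(x)
    lcr, gcr = n // 2, 1               # initial LCR and GCR values
    while gcr < n:                     # stop when GCR becomes N
        for wr in range(gcr):          # WR increases by 1 per group
            w = cmath.exp(-2j * cmath.pi * bit_reverse(wr, wr_bits) / n)
            base = wr * 2 * lcr        # group offset + 1
            for i in range(lcr):
                a, b = base + i, base + i + lcr
                t = w * data[b]
                data[a], data[b] = data[a] + t, data[a] - t
        lcr >>= 1                      # 1-bit right shift of LCR
        gcr <<= 1                      # 1-bit left shift of GCR
    return data
```

Under these assumptions the output agrees with a direct evaluation of formula 1, so one twiddle factor per group, taken in bit-reversed order, is sufficient for this addressing pattern.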

The offset address generated in the FFT address generator 130 is input to an offset register of the programmable processor and used as an offset for a base address. A programmable processor which is currently being developed uses plural arithmetic and logic units to calculate the address. Hence, a final data address can be calculated by using the offset address generated in the FFT address generator 130.

FIG. 10 shows the configuration of the data processor 150 to efficiently carry out the FFT. Referring to FIG. 10, the data processor 150 includes two multiplier-accumulators and an arithmetic and logic unit to carry out the butterfly operation, a data bus switch circuit to control data according to the operation flow, 8 input registers, and three accumulators. By using four multiplexers, the multiplier-accumulator according to the present invention may function as two separate multiplier-accumulators or carry out a function of adding and accumulating two multiplied results.

FIG. 11A shows a configuration of a conventional dual multiplier-accumulator having two separate multiplier-accumulators to output two accumulated results. FIG. 11B shows a configuration capable of accumulating the sum of two multiplied results by using a 3-input adder. FIG. 11C shows a dual multiplier-accumulator capable of carrying out the above conventional functions by using the multiplexer according to the present invention. If a selection input of the multiplexer is ‘0’, the dual multiplier-accumulator operates as in FIG. 11A, and if the selection input is ‘1’, the dual multiplier-accumulator operates as in FIG. 11B. Six input registers store a_(r), a_(i), b_(r), b_(i), w_(r), and w_(i), respectively. Three accumulators are required to store 2 multiplier-accumulator values and one arithmetic and logic unit value.
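The two operating modes selected by the multiplexer can be modeled behaviorally as follows. This is only a sketch: the function name `dual_mac_step`, the argument order, and the choice of the first accumulator as the destination of the 3-input addition are assumptions, since the figures are not reproduced in the text.

```python
def dual_mac_step(acc0, acc1, x0, y0, x1, y1, sel):
    """One cycle of a dual multiplier-accumulator with a mode select input.

    sel == 0: two separate multiply-accumulates (as in FIG. 11A).
    sel == 1: both products are added to one accumulator by a 3-input
              adder (as in FIG. 11B); the other accumulator is unchanged.
    Returns the new (acc0, acc1).
    """
    p0 = x0 * y0
    p1 = x1 * y1
    if sel == 0:
        return acc0 + p0, acc1 + p1
    return acc0 + p0 + p1, acc1

print(dual_mac_step(1, 2, 3, 4, 5, 6, 0))
print(dual_mac_step(1, 2, 3, 4, 5, 6, 1))
```

The second mode is the one a butterfly needs, because each of the terms w_r b_r − w_i b_i and w_r b_i + w_i b_r combines two products into a single result.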

FIG. 12 shows the data bus switch of the data processor 150. The data bus switch can be implemented using six 2×1 multiplexers adapted to a data bus switch of a conventional digital signal processor without having to re-design the circuit.

As aforementioned, the FFT operation method and a circuit to implement the FFT operation method are provided to enhance performance by minimizing the operation cycles which occur in the looping instruction, the data shift, and the address calculation of the butterfly input data in addition to the butterfly operation, in the conventional programmable processor of which performance is not enhanced through the acceleration of the butterfly operation. Further, according to the present invention, the operating apparatus of the conventional digital signal processor can be re-used by including the FFT address generator 130 and the switch circuit of the data to thereby enhance the performance and facilitate the design and the modification.

Table 2 compares the number of the FFT operation cycles and the number of the multiplier-accumulators of conventional programmable processors with those of the configuration according to the present invention. The configuration according to the present invention does not generate additional operation cycles except for the butterfly operation. Compared with a conventional digital signal processor having the same number of the multiplier-accumulators, the performance of the 256-point FFT is enhanced by 16% to 57%.

TABLE 2

Digital signal processor                 Number of butterfly   N = 256   N = 1024   Formula                                 Number
                                         operation cycles                                                                   of MAC
DSP1620                                  -                     16065     -          -                                       1
DSP56602                                 8                     9600      49680      -                                       1
DSP56303                                 -                     9096      -          -                                       1
TMS320C54x                               8                     8542      42098      -                                       1
TMS320C55x                               5                     4786      -          -                                       2
TMS320C62x                               4                     4225      20815      (4N/2)log₂ N + 7log₂ N + N/4 + 9        2
TMS320C67x                               -                     4286      20716      (2N/2)log₂ N + 23log₂ N + 6             2
Carmel DSP core                          2                     2452      11624      (2N/2)log₂ N + 5N/4 + 10log₂ N + 4      2
Palm DSP core                            2                     -         -          -                                       2
Frio core                                3                     3176      -          -                                       2
StarCore (SC140)                         1.5                   -         -          -                                       4
Configuration of the present invention   2                     2051      10243      (2N/2)log₂ N + 6                        6

Therefore, the FFT operating apparatus according to the present invention adds little hardware to the conventional programmable processor to thereby reduce the number of the FFT operation cycles, provide design flexibility to a FFT processor which has been implemented with a conventional on-demand semiconductor chip, and allow real-time processing of an advanced telecommunication system.
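The 16% to 57% range quoted for the 256-point FFT can be checked against the tabulated cycle counts of the processors listed with two multiplier-accumulators. The sketch below uses only the N = 256 values from Table 2; the dictionary and function names are hypothetical.

```python
# N = 256 cycle counts from Table 2 for processors listed with 2 MACs.
CONVENTIONAL_256 = {
    "TMS320C55x": 4786,
    "TMS320C62x": 4225,
    "TMS320C67x": 4286,
    "Carmel DSP core": 2452,
    "Frio core": 3176,
}
PRESENT_INVENTION_256 = 2051

def improvement_percent(conventional, proposed=PRESENT_INVENTION_256):
    """Reduction in cycle count relative to the conventional processor, in percent."""
    return 100.0 * (conventional - proposed) / conventional

for name, cycles in CONVENTIONAL_256.items():
    print(f"{name}: {improvement_percent(cycles):.1f}%")
```

The smallest reduction is obtained against the Carmel DSP core and the largest against the TMS320C55x, which correspond to the two ends of the quoted range.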

Although a few preferred embodiments of the present invention have been described, it will be understood by those skilled in the art that the present invention should not be limited to the described preferred embodiments, but various changes and modifications can be made within the spirit and scope of the present invention as defined by the appended claims.

1. A fast Fourier transform (FFT) operating apparatus to carry out a FFT operation in a programmable processor chip, comprising: a program controller to generate a FFT start signal and control a programmable processor; a program memory to store an application of the programmable processor; an FFT address generator to remove a looping instruction used for the FFT and a cycle for an address generation, and generate an offset address of a butterfly input data and an operation stop signal; an address generator to calculate an address of a data memory using the offset address generated in the FFT address generator; a data memory to store a data; a data processor to carry out an arithmetic and logic operation using the data stored in the data memory; and a flag register to generate a FFT operation signal.

2. The apparatus of claim 1, wherein the FFT address generator comprises: a logical sum logic to generate initialization signals of a register to store a loop count value and a register to store a group count value according to the start signal and a group count match signal; a first adder to update a group offset with a value obtained by multiplying the group offset and a loop count max value by 2; GR, WR, LCR, and GCR registers to store the group offset, a twiddle factor, the loop count max value, and a group count max value; a group counter to calculate the group count value; a loop counter to calculate the loop count value; a glue logic having a logic which generates a signal to initialize the group counter and the loop counter; a second adder to add the group offset and the loop count value and output a single input data address; a third adder to input with and add the second adder and the loop count max value and output another input data address; a first comparator to compare a value of the loop counter and the loop count max value; a second comparator to compare a value of the group counter and the group count max value; and a third comparator to input with a N value and the group count max value and compare the group count max value with a N/2 value.

3. The apparatus of claim 1, wherein the data processor comprises: a data bus switch circuit to provide the butterfly input data from the data memory and write an output data in the data memory; a butterfly operation circuit having two multiplier-accumulators to multiply and accumulate a data and one arithmetic and logic unit; an exponential operation circuit to carry out an exponential operation of a data in the butterfly operation; an input register to store a value of the data memory; and an accumulator to store an operation result and re-use the stored value for the operation.

4. A radix-2 complex fast Fourier transform (FFT) operation method to carry out a FFT operation in a programmable processor chip, comprising: generating a start signal and applying a FFT operation signal if the FFT starts; generating an offset address of a butterfly input/output data to read a data and write an operated result in a data memory; storing the generated offset address of the butterfly input/output data in an offset register of a programmable processor; switching a data to provide the butterfly input data from the data memory and write the output data in the data memory; carrying out a butterfly operation using two multiplier-accumulators, an arithmetic and logic unit, and an exponenter; and generating a stop signal and resetting the FFT operation signal when the operation is ended.

5. The method of claim 4, wherein operation instructions SBUTTERFLY and ABUTTERFLY are used for the FFT operation.

6. The method of claim 4, wherein generating the offset address by a FFT address generator comprises: starting the FFT if the FFT start signal is ‘1’; initializing a group count, a loop count, and a group count max value to ‘1’, respectively, a group offset value to ‘−1’, a loop count max value to ‘N/2’, and an offset address value of a twiddle factor to ‘0’ if the FFT starts; calculating an input data address by adding the group offset and the loop count values and calculating another input data address by adding the group offset, the loop count, and the loop count max values; increasing the loop count value by 1 if the loop count value is not equal to the loop count max value and resuming from calculating the two input data addresses; initializing the loop count value to ‘1’, setting the group offset value with a value obtained by multiplying the loop count max value by 2 and adding the group offset value to the multiplied value, and increasing the twiddle factor value by 1 if the loop count value is equal to the loop count max value; increasing the group count value by 1 and resuming from calculating the two input data addresses if the group count is not equal to the group count max value; initializing the group count value to ‘1’, the group offset value to ‘−1’, and the twiddle factor value to ‘0’, dividing the loop count max value by 2, and multiplying the group count max value by 2 if the group count value is equal to the group count max value; generating the operation stop signal and ending the FFT operation if the group count max value is greater than N/2; and resuming from calculating the two input data addresses if the group count max value is not greater than N/2.